Abstract:
Information systems today generate large amounts of data to support the activities of firms, organisations, and state agencies. On the one hand, such data are primarily collected to deliver domain-specific services (e.g., state agencies use data to manage healthcare and retirement contributions); on the other hand, domain analysts aim to use such data to study the dynamics of subjects’ behaviours or phenomena over time. Thus, the quality of data plays a key role in ensuring the effectiveness of the overall knowledge discovery process. In this context, most research on data quality aims at automatically identifying cleansing activities, namely sequences of actions able to cleanse a dirty dataset, which are often developed and coded manually, requiring considerable effort from domain experts. This work is concerned with using AI Planning both to model data quality requirements and to automatically identify cleansing activities. To this end, we formalise the concept of a cost-optimal Universal Cleanser (a collection of the best cleansing actions for each data inconsistency identified) as a planning problem. We then present a motivating government application where data have been cleansed accordingly, and make both the source and cleansed datasets publicly available for download.
Abstract:
Data Lake is a new concept for handling and analyzing large volumes of different types of data. It has emerged with the evolution of technology and with the new generation that brought new requirements, multiple resources and media information. In this paper, we present the Data Lake concept and highlight the latest developments in the field. We also perform a critical analysis of the advantages and disadvantages currently offered by Data Warehouses. Finally, by comparing the two concepts, we address the question of whether Data Lake will replace Data Warehouse in the near future. In this context, our main contribution is a qualitative and comparative study of Data Lake and Data Warehouse, highlighting the advantages and improvements that Data Lake brings to the storage of large data volumes.
Abstract:
Influenza viruses cause annual winter epidemics in temperate regions, with significant morbidity, mortality and economic impact. Fluarix® is a split, trivalent, inactivated vaccine, manufactured from highly purified, egg-grown influenza viruses by GlaxoSmithKline. In 2005, Fluarix underwent accelerated approval for use in adults by the US FDA following a US-based, randomized, placebo-controlled trial that established its safety and immunogenicity in adults. The vaccine has been licensed in Europe since 1992 for all age groups. Multiple registration trials in all age groups in Europe have demonstrated that the vaccine was safe and well tolerated and that its immunogenicity met the requirements of the European Committee for Medicinal Products for Human Use. There are no published clinical trials evaluating the effectiveness or efficacy of Fluarix against influenza and its complications. Currently, Fluarix plays an important role in diversifying the supply chain of influenza vaccine to the community. However, vaccines with improved immunogenicity in at-risk populations, such as the elderly, and with less reliance on growth in eggs and its inherently demanding timelines, are needed to enhance the control of influenza.
Abstract:
This article presents a new theory on the nature of turbulence: when the Reynolds number is large, violent fully developed turbulence is due to "rough dependence on initial data" rather than chaos, which is caused by "sensitive dependence on initial data"; when the Reynolds number is moderate, (often transient) turbulence is due to chaos. The key to validating the theory is estimating the temporal growth of the initial perturbations with the Reynolds number as a parameter. Analytically, this amounts to estimating the temporal growth of the norm of the derivative of the solution map of the Navier-Stokes equations, for which I obtain an upper bound here. This bound clearly indicates that when the Reynolds number is large, the temporal growth rate can potentially be large in a short time, i.e. rough dependence on initial data.
Abstract:
Agriculture has always been a sector with several specificities that call for adjusted interventions from public institutions through agricultural policies. The European Union is no exception: there, the Common Agricultural Policy has had more impact in some contexts than the national agricultural policies of the member states. In turn, profit margins are generally narrow, which calls for specific financial and economic management. However, the financial and economic instruments and indicators for farming are often ignored or, at least, not sufficiently analysed. From this perspective, the main objective of this study is to assess the net working capital framework across European Union countries and regions, including assessments by type of farming and economic size. Another objective is to analyse the impacts of financial indicators (current ratio, current assets-to-total assets ratio, current liabilities-to-total assets ratio, and debt-to-total assets ratio) on profitability (return on assets) and financial performance (return on equity). For this purpose, data from the Farm Accountancy Data Network were considered for the period 2004-2018. These data were analysed through descriptive analysis, spatial autocorrelation approaches, and panel data regressions. As main conclusions, it is worth noting the diversity of financial realities across the European farming sector and the null impacts of the liquidity ratio on the farms’ performance.
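The indicators named in this abstract can be illustrated with a minimal Python sketch. It uses common textbook definitions, which are an assumption here: the study's exact Farm Accountancy Data Network variable definitions may differ. The `FarmAccounts` type, its field names, and the `indicators` function are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FarmAccounts:
    """Hypothetical balance-sheet figures for one farm, all in the same currency."""
    current_assets: float
    total_assets: float
    current_liabilities: float
    total_liabilities: float
    net_income: float


def indicators(a: FarmAccounts) -> Dict[str, float]:
    """Compute the indicators using textbook definitions (an assumption)."""
    # Equity is taken as total assets minus total liabilities.
    equity = a.total_assets - a.total_liabilities
    return {
        "net_working_capital": a.current_assets - a.current_liabilities,
        "current_ratio": a.current_assets / a.current_liabilities,
        "current_assets_to_total_assets": a.current_assets / a.total_assets,
        "current_liabilities_to_total_assets": a.current_liabilities / a.total_assets,
        "debt_to_total_assets": a.total_liabilities / a.total_assets,
        "return_on_assets": a.net_income / a.total_assets,
        "return_on_equity": a.net_income / equity,
    }
```

In the study's design, the four ratios would act as explanatory variables and the two return measures as dependent variables in the panel regressions; the sketch covers only the arithmetic of the indicators.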
Abstract:
This study aims to determine whether Twitter is a social media platform that supports the One Data Indonesia Interoperability process to implement data regulations by producing accurate, integrated and accountable data. The use of social media is examined in terms of Participation, Transparency, Discussion, Involvement, and Communication Strategy. This descriptive research uses NVivo 12 Plus with data sources from Twitter @diklat BIG, @datagoid, @bappenasRI, and @Jokowi, collected via NCapture. The NCapture data were analysed with the automatically configured crosstab feature. This study shows that social media significantly drives data interoperability strategies in Indonesia. Communication indicators from Participation in previous attempts were 22.00% and 3236 referrals. The Transparency category, at 21.00%, includes clear objectives, processes and program support. A further 20.00% relates to the various stages of data collection, verification, and data priority release. To socialize the implementation of One Data Indonesia and the action to accelerate the provision of the One Data Portal, tweets were used with an Involvement proportion of 18.00%. Communication Strategy realizes the importance of data and the existence of One Data Indonesia by using the Conversation Indicator, which has a Participation proportion of 17.00%. This study is essential for analyzing social media assistance using Twitter-based activity support and providing new, up-to-date data with NVivo 12 Plus analysis.
Abstract:
Based on big data analysis methods and automatic detection technology, this paper designs a test system that can realize intelligent online monitoring of seawater. Drawing on big data theory, big data preprocessing methods are applied to the information transmitted by integrated sensors. Using data cleaning, data integration, data conversion and data reduction techniques, the large volume of data collected by marine monitoring devices is processed accurately. An automatic seawater monitoring system is designed on a software platform. Finally, the test results are analyzed using experimental data from a certain sea area, which demonstrates the feasibility and effectiveness of the designed seawater online monitoring system. The system achieves seawater environmental analysis and early warning.
Abstract:
This paper presents a framework for discovering similar users on Twitter that can be used in profiling users for social, recruitment and security reasons. The framework contains a novel formula that calculates the similarity between users on Twitter by using seven different signals (features). The signals are followings and followers, mention, retweet, favorite, common hashtag, common interests, and profile similarity. The proposed framework is scalable and can handle big data because it is implemented using the MapReduce paradigm. It is also adjustable, since the weight and contribution of each signal in calculating the final similarity score are determined by the user based on their needs. The accuracy of the system was evaluated through human judges and by comparing the system’s results against Twitter’s Who To Follow service. The results are moderately accurate.
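The idea of user-adjustable signal weights can be illustrated with a short Python sketch. The abstract does not give the paper's formula, so the weighted average below is an assumption and not the authors' method. The signal keys and the `combined_similarity` function are hypothetical names, and each per-signal score is assumed to be already normalised to the range [0, 1].

```python
from typing import Dict

# Hypothetical keys for the seven signals listed in the abstract.
SIGNALS = (
    "followings_followers",
    "mention",
    "retweet",
    "favorite",
    "common_hashtag",
    "common_interests",
    "profile_similarity",
)


def combined_similarity(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-signal scores (assumed form, not the paper's formula)."""
    total_weight = sum(weights[s] for s in SIGNALS)
    if total_weight <= 0:
        raise ValueError("at least one signal needs a positive weight")
    # Dividing by the total weight keeps the result in [0, 1]
    # whenever every per-signal score is in [0, 1].
    return sum(weights[s] * scores[s] for s in SIGNALS) / total_weight
```

Setting a weight to zero removes that signal from the score, which is one simple way a user could tune the contribution of each signal to their needs.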
Abstract:
On-the-fly data integration, i.e. at query time, happens mostly in tightly coupled, homogeneous environments where the partitioning of the data can be controlled or is known in advance. During the process of data fusion, the information is homogenized and data inconsistencies are hidden from the application. Going beyond this, we propose in this paper the Nexus metadata model and a processing approach that support on-the-fly data integration in a loosely coupled federation of autonomous data providers, thereby advancing the status quo in terms of flexibility and expressive power. The model is able to represent data and schema inconsistencies such as multi-valued attributes and multi-typed objects. In an open environment, where the data processing infrastructure is not able to decide which attribute value is correct, this best suits the application needs. The Nexus metadata model provides the foundation for integration schemata that are specific to a given application domain. The corresponding processing model provides four complementary query semantics in order to account for the subtleties of multi-valued and missing attributes. In this paper we show that these query semantics are sound, are easy to implement, and build upon existing query processing techniques. Thus the Nexus metadata model provides a unique level of flexibility for on-the-fly data integration.
Abstract:
We present the first solution to tau-majorities on tree paths. Given a tree of n nodes, each with a label from [1..sigma], and a fixed threshold 0 [...] 1, we can also build a structure that uses O(n lg^{[kappa]} n) space, where lg^{[kappa]} n denotes the function that applies the logarithm kappa times to n, and answers queries in time O((1/tau) lg lg_w sigma). The construction time of both structures is O(n lg n). We also describe two succinct-space solutions with the same query time as the linear-space structure. One uses 2nH + 4n + o(n)(H+1) bits, where H <= lg sigma is the entropy of the label distribution, and can be built in O(n lg n) time. The other uses nH + O(n) + o(nH) bits and is built in O(n lg n) time w.h.p.
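To make the query concrete, here is a naive Python baseline, not the paper's data structure. It assumes the usual definition of a tau-majority: a label is reported if it occurs on more than a tau fraction of the nodes on the path between two query nodes. The function names and the adjacency-list input format are hypothetical. Each query takes time linear in the tree size, which is the cost the structures described above are designed to avoid.

```python
from collections import Counter, deque
from typing import Dict, List, Optional


def tree_path(adj: Dict[int, List[int]], u: int, v: int) -> List[int]:
    """Return the nodes on the unique u-v path of a tree, found by BFS from u."""
    parent: Dict[int, Optional[int]] = {u: None}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    # Walk back from v to u through the BFS parents, then reverse.
    path: List[int] = []
    cur: Optional[int] = v
    while cur is not None:
        path.append(cur)
        cur = parent[cur]
    return path[::-1]


def tau_majorities(adj: Dict[int, List[int]], labels: Dict[int, int],
                   u: int, v: int, tau: float) -> List[int]:
    """Labels occurring on more than tau * (path length) nodes of the u-v path."""
    path = tree_path(adj, u, v)
    counts = Counter(labels[n] for n in path)
    return sorted(lab for lab, c in counts.items() if c > tau * len(path))
```

Because each reported label must cover more than a tau fraction of the path, a query returns fewer than 1/tau labels, which is consistent with the 1/tau factor in the query times quoted above.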